During first ssh connection, wait for routable ip by maximenoel8 · Pull Request #2070 · uyuni-project/sumaform

maximenoel8 · 2026-03-12T03:50:19Z

Problem

During the provisioning phase of test deployments, OpenTofu/Terraform was intermittently failing with the error:
dial tcp [fe80::...]:22: connect: invalid argument

This occurred because the libvirt_domain resource often reports the IPv6 link-local address (fe80::/10) via the QEMU agent before the global IPv6 or DHCP-assigned IPv4 addresses are available. Because link-local addresses are not reachable from the Jenkins worker, the connection fails.

Additionally, the previous logic could pick up IPv4 link-local (APIPA, 169.254.0.0/16) addresses, which would lead to connection timeouts.

Why the loop

To resolve the "empty host" race condition without relying on arbitrary sleep timers, a dynamic waiter has been introduced.

Transition to Dynamic Waiting: Replaced the static time_sleep with a local-exec polling loop. This loop queries the Libvirt QEMU agent until a routable (non-link-local) address is detected.
Why we still keep the Regex: While the waiter ensures a routable IP exists, the Libvirt metadata still returns a list containing all detected IPs (including fe80::). The regex remains necessary in the host field to explicitly select the routable address from that list and avoid the "Invalid Argument" error.
Efficiency: The provisioning phase now starts the moment the network is ready, reducing total deployment time in CI compared to a fixed sleep.
Fail-Fast: If a routable IP is not assigned within the timeout period (e.g., DHCP failure), the waiter exits with an error, providing a clear failure point in Jenkins rather than a generic SSH timeout.
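The dynamic waiter described above can be sketched as a small retry helper. This is a minimal illustration and not the wait_for_ip.sh script from this PR: the function name `poll_until` and its arguments are hypothetical, and it bounds the wait by a number of attempts rather than by wall-clock time.

```shell
#!/bin/sh
# Hypothetical sketch of a polling loop with fail-fast behaviour.
# It is NOT the wait_for_ip.sh script from the PR.

# poll_until ATTEMPTS INTERVAL CMD [ARGS...]
# Runs CMD until it exits 0 and prints something, then echoes that output.
# Returns 1 with a clear error message if every attempt fails.
poll_until() {
  local attempts="$1" interval="$2" i=1 out
  shift 2
  while [ "$i" -le "$attempts" ]; do
    # Success means: the command exited 0 AND produced non-empty output.
    if out="$("$@" 2>/dev/null)" && [ -n "$out" ]; then
      printf '%s\n' "$out"
      return 0
    fi
    echo "attempt $i/$attempts: no result yet, retrying in ${interval}s" >&2
    sleep "$interval"
    i=$((i + 1))
  done
  echo "ERROR: gave up after $attempts attempts: $*" >&2
  return 1
}

# Example call (get_routable_ip is a hypothetical command that prints a
# routable address or fails):
#   poll_until 30 5 get_routable_ip my-domain
```

Compared with a fixed sleep, the loop returns as soon as the command succeeds, and a non-zero exit status lets a local-exec provisioner fail the run before any SSH attempt is made.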

What this PR does

Updated the connection block in the terraform_data.provisioning resource to use a strict filter for the host attribute.

Logic: The new logic iterates through all addresses reported by the VM's first network interface and excludes any string starting with fe80 (IPv6 Link-Local) or 169.254 (IPv4 Link-Local).

Result: The provisioner now only attempts to connect to routable Global IPv6 or IPv4 addresses.

Safety: Removed the fallback to 127.0.0.1. If no routable address is found, the host now evaluates to null. This prevents the provisioner from "masking" the failure by attempting to SSH into the local runner/bastion host.
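The selection rule above (skip anything starting with fe80 or 169.254, and return nothing rather than falling back to 127.0.0.1) can be illustrated outside of HCL. The actual filter in this PR is an expression on the host attribute; the shell sketch below only mirrors the same rule, and the function names `is_routable` and `first_routable` are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of the address filter; the PR implements the
# equivalent rule as an HCL expression, not as shell.

# is_routable ADDR: succeed unless ADDR is empty or link-local.
# The prefix checks mirror the PR description (strings starting with
# fe80 or 169.254), not the full fe80::/10 range.
is_routable() {
  case "$1" in
    "")            return 1 ;;  # nothing reported yet
    fe80:*|FE80:*) return 1 ;;  # IPv6 link-local
    169.254.*)     return 1 ;;  # IPv4 link-local (APIPA)
    *)             return 0 ;;
  esac
}

# first_routable: read one address per line on stdin and print the first
# routable one. Returns 1 (and prints nothing) if there is none, so the
# caller never silently falls back to a local address.
first_routable() {
  local addr
  while IFS= read -r addr; do
    addr="${addr%%/*}"  # drop a CIDR suffix such as /64 if present
    if is_routable "$addr"; then
      printf '%s\n' "$addr"
      return 0
    fi
  done
  return 1
}
```

For example, feeding it the three lines `fe80::a8b2:93ff:fe02:3d1/64`, `169.254.10.2` and `192.168.122.15/24` prints `192.168.122.15`.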

16:06:02  │ Error: file provisioner error
16:06:02  │ 
16:06:02  │   with module.build_validation_module.module.server[0].module.server.module.host.terraform_data.provisioning[0],
16:06:02  │   on /home/jenkins/workspace/manager-4.3-qe-mi-validation-sles/results/sumaform/backend_modules/libvirt/host/main.tf line 278, in resource "terraform_data" "provisioning":
16:06:02  │  278:   provisioner "file" {
16:06:02  │ 
16:06:02  │ timeout - last error: dial tcp [fe80::a8b2:93ff:fe02:3d1]:22: connect:
16:06:02  │ invalid argument
16:06:02  ╵
script returned exit code 1

Depends on SUSE/susemanager-ci#1934

maximenoel8 · 2026-03-12T03:54:30Z

    type           = "pty"
    target_port    = "0"
    target_type    = "serial"
-    source_host    = null


Why were those lines removed?

Answer:
They are redundant defaults that can occasionally cause validation warnings or clutter in modern OpenTofu/Terraform providers.

srbarrios · 2026-03-12T07:03:32Z

Instead of forcing IPv4, I would first like to understand why using IPv6 fails in that particular environment.

Copilot

Pull request overview

Adjusts the libvirt host provisioning logic to prefer an IPv4 address for the initial SSH/provisioner connection, addressing failures when Terraform attempts to connect via an IPv6 link-local (fe80::/10) address.

Changes:

Update connection.host selection to prioritize IPv4 addresses and avoid fe80:: link-local IPv6 addresses.
Consolidate multiple remote-exec provisioners into a single provisioner with sequential commands.
Minor cleanups/formatting adjustments in backend_modules/libvirt/host/main.tf.

Bischoff · 2026-03-12T09:36:20Z

> Instead of forcing IPv4, I would like first to understand and have an explanation of why using IPv6 fails that particular environment.

Linux kernel would always prefer IPv6 over IPv4 when there is a choice. So it's expected to use IPv6 whenever possible. It should work.

I would like to understand better the situation too.

Bischoff

Please do not merge this, at least for now.

From what I see after some quick debugging, there is a problem at network level I need to solve.

maximenoel8 · 2026-03-12T10:18:02Z

> Instead of forcing IPv4, I would like first to understand and have an explanation of why using IPv6 fails that particular environment.

I updated the PR description

maximenoel8 · 2026-03-12T10:44:08Z

Moving it to draft, the new version is not working yet.

maximenoel8 · 2026-03-12T10:48:01Z

Ok, the changes are working

Bischoff · 2026-03-12T10:58:28Z

> Ok, the changes are working

please change the title(s), we are not forcing ipv4 anymore

Copilot

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

srbarrios · 2026-03-13T06:58:40Z

Instead of using external scripts, maybe we can use the same local-exec + qemu-monitor-command mechanism (but with a different call) that I used in my PR for the libvirt provider example.

See:

# Post-provisioning: Automatic SSH Port Forwarding
# Maps host port 2222 to guest port 22 via QEMU monitor command (HMP)
resource "null_resource" "auto_ssh_port" {
  # Ensure the VM is created before attempting port mapping
  depends_on = [libvirt_domain.ubuntu_vm]

  # Force execution on every 'apply' to maintain the mapping
  triggers = {
    always_run = timestamp()
  }

  provisioner "local-exec" {
    command = <<-EOT
      echo "Waiting for VM to initialize..."
      sleep 10
      virsh -c qemu:///session qemu-monitor-command ubuntu-vm --hmp "hostfwd_add hostnet0 tcp::2222-:22"
      echo "SSH access available at: ssh ubuntu@localhost -p 2222"
    EOT
  }
}

maximenoel8 · 2026-03-25T01:16:10Z

> Instead of using external scripts, maybe we can use the same mechanism of local-exec + qemu-monitor-command (but with different call) that I made on my PR for the libvirt provider example.

Thanks for the reference @srbarrios!

I looked at your example in dmacvicar/terraform-provider-libvirt#1288. The qemu-monitor-command approach works well for simple port forwarding, but I think the use cases are different enough to justify keeping the external scripts here. Here's my reasoning:

| | Inline `local-exec` (your example) | External scripts (this PR) |
|---|---|---|
| Wait strategy | Static `sleep 10` | Polling loop with retries |
| Failure behaviour | Silent (continues regardless) | Fail-fast with clear error |
| Readability | Fine for simple commands | Complex logic better kept out of HCL heredocs |
| Testability | Hard to test in isolation | Scripts can be run/debugged independently |

The core difference is that a static sleep isn't reliable enough for our CI environment: DHCP timing varies, and we need to know early if a routable IP never appears rather than getting a generic SSH timeout later.

Embedding a multi-step polling loop with grep/awk/virsh error handling inside an HCL heredoc would work, but it becomes difficult to read and debug. Keeping the logic in wait_for_ip.sh means it can be run standalone against any domain for troubleshooting.
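To illustrate why a standalone script is easier to debug, here is a minimal sketch of the kind of parsing step such a script might contain. It assumes the addresses come from `virsh domifaddr <domain> --source agent`; whether wait_for_ip.sh actually uses that command is an assumption, and the function name `parse_domifaddr` is hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch; assumes the tabular output of
# `virsh domifaddr DOMAIN --source agent` is piped in on stdin.

# parse_domifaddr: print one address per line, without the prefix length,
# skipping the loopback interface.
parse_domifaddr() {
  awk '
    $3 != "ipv4" && $3 != "ipv6" { next }   # skip header and separator rows
    $1 != "-" { iface = $1 }                # "-" rows continue the previous interface
    iface != "lo" { sub(/\/[0-9]+$/, "", $4); print $4 }
  '
}

# Standalone use against any domain (LIBVIRT_URI and the domain name are
# placeholders):
#   virsh -c "$LIBVIRT_URI" domifaddr my-domain --source agent | parse_domifaddr
```

Because the function only reads stdin, it can be exercised with captured `virsh` output on a workstation, without a hypervisor connection.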

NamelessOne91 · 2026-04-16T14:09:05Z

I am not sure if this is strictly related, but I remember that when we updated the terraform-libvirt-provider we had similar issues around IPv6 and link-local addresses being used. Especially for retail.

Technically, the provider we build in OBS should be patched to avoid considering an interface ready if a link-local address is the only one available.
See https://build.opensuse.org/projects/systemsmanagement:sumaform/packages/terraform-provider-libvirt/files/ipv6.patch?expand=1

Can you double check that:

- we are indeed using a provider binary installed from our repo and RPM, not one pulled from the public registry
- that DEBUG log pops up and there's a match with the IP causing trouble

maximenoel8 requested a review from a team as a code owner March 12, 2026 03:50

srbarrios requested a review from Copilot March 12, 2026 07:04

Copilot started reviewing on behalf of srbarrios March 12, 2026 07:04

Copilot AI reviewed Mar 12, 2026

srbarrios reviewed Mar 12, 2026

Bischoff self-requested a review March 12, 2026 09:49

Bischoff requested changes Mar 12, 2026

maximenoel8 force-pushed the force_ipv4 branch from 2767291 to 3996852 Compare March 12, 2026 10:16

maximenoel8 force-pushed the force_ipv4 branch from 5f41428 to 6ec2bd6 Compare March 12, 2026 10:35

maximenoel8 marked this pull request as draft March 12, 2026 10:44

maximenoel8 force-pushed the force_ipv4 branch from 6ec2bd6 to 44a8395 Compare March 12, 2026 10:45

maximenoel8 marked this pull request as ready for review March 12, 2026 10:48

maximenoel8 force-pushed the force_ipv4 branch from 44a8395 to 52c344d Compare March 12, 2026 10:53

maximenoel8 changed the title from "Force ipv4" to "During first ssh connection, wait for routable ip" Mar 12, 2026

srbarrios requested a review from Copilot March 13, 2026 06:05

Copilot started reviewing on behalf of srbarrios March 13, 2026 06:05

Copilot AI reviewed Mar 13, 2026

srbarrios reviewed Mar 13, 2026

maximenoel8 force-pushed the force_ipv4 branch 2 times, most recently from 1181955 to f10a0a8 Compare March 18, 2026 20:58

maximenoel8 added 26 commits April 16, 2026 23:19

6cf9f16  Try force ipv4 in libvirt host resource configuration
72e3e7a  Improve syntax for IPv4 selection in libvirt host resource configuration
9ed4e08  Skip local links in libvirt host resource configuration
4f625e5  Add a loop to wait for libvirt domain to report a routable IP before provisioning
b428d5f  Increase timeout for waiting for libvirt domain IP to be routable
f888578  Try device random
e911d8f  Increase more
1c5b82e  Try rng.xls
dddfc29  Try using tcp
9186c2f  Use env variable for hypervisor URI
7a0f285  Use env variable for hypervisor URI
a32e138  Parse the hypervisor
937a96b  Reduce timeout for waiting for
fba4618  Check for hypervisor and virsh availability before waiting for IP
9bf2f6f  Add other non routable IP
7d2592b  Read the IP from the file written by wait_for_ip.sh and store it for provisioning.
f3433ca  Use file to store and reference IP from wait_for_ip.sh output
07349fb  Use file to store and reference IP from wait_for_ip.sh output
1b0de43  Doesn't work revert.
e8f374e  fix regex to exclude non-routable IP addresses
050fc62  Remove lease check and simplify IP retrieval logic.
2f1dbaa  Try to use data external to read IP from wait_for_ip.sh and avoid relying on Terraform state.
18c769e  Data is not order correctly
070ac49  Update with review. Remove unnecessary conditional checks and simplify IP retrieval logic.
76f4350  Update the regex
8a0bd80  update regex

maximenoel8 force-pushed the force_ipv4 branch from 00a7683 to 8a0bd80 Compare April 16, 2026 11:19

6422c46  Try fix

maximenoel8 closed this Apr 20, 2026

5 participants